User:John Cummings/Archive/Dataimporthub
Data import hub

This page is a hub to organise importing data from external sources. To request a data import, please see the section below; the basic process of a dataset being imported follows the workflow stages shown in the table below.

Why import data into Wikidata

Request a data import

Instructions for data importers

Workflow
| Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Importing data into Wikidata | Date import complete and notes |
|---|---|---|---|---|---|
| Name: Source: Link: Description: | Link: Done: To do: Notes: | Structure: Example item: Done: To do: | Done: To do: Notes: | Done: To do: Manual work needed: | Date complete: Notes: |
Discussion:
Imported data sets

Please click here for a list of previously imported data sets
MIS Quarterly Articles information
Name of dataset: MIS Quarterly Articles information
Source: http://www.misq.org/roles/
Link: http://www.misq.org/roles/
Description: MISQ is the highest-impact-factor journal in Information Systems. I would like to import its article information into Wikidata.
Request by: Mahdimoqri (talk) 22:13, 12 January 2017 (UTC)
SCOGS (Select Committee on GRAS Substances), Generally recognised as safe database
| Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Importing data into Wikidata | Date import complete and notes |
|---|---|---|---|---|---|
| Name: SCOGS (Select Committee on GRAS Substances), Generally recognised as safe. Source: FDA. Link: https://www.accessdata.fda.gov/scripts/fdcc/cfc/XMLService.cfc?method=downloadxls&set=SCOGS Description: FDA-allowed dietary supplements. | Link: Done: https://docs.google.com/spreadsheets/d/1-6PkozVUm_8dKxPDqs8M71Me0oNA4m-h8UjY1qZkheU/edit?usp=sharing To do: Notes: Added fields: | Structure: For each item, add instance of (P31). Field: Example item: Done: https://www.wikidata.org/wiki/Q132298 To do: Cannot use new properties directly. | Done: Data formatted. To do: Notes: | Done: To do: Manual work needed: | Date complete: Notes: |
Mdupont (talk) 23:59, 1 July 2017 (UTC)
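As a minimal sketch of the "Create and import data into spreadsheet" step, the following (assuming pandas with an .xls-capable engine such as xlrd is installed) fetches the SCOGS export from the FDA URL in the table and prints its columns, a reasonable first pass before mapping fields to Wikidata properties:

```python
# Hedged sketch: download the SCOGS .xls export named in the table above and
# inspect it. Column names are whatever the FDA ships; nothing here is a
# confirmed part of the import workflow.
import pandas as pd

SCOGS_URL = ("https://www.accessdata.fda.gov/scripts/fdcc/cfc/"
             "XMLService.cfc?method=downloadxls&set=SCOGS")

df = pd.read_excel(SCOGS_URL)      # pandas accepts a URL directly
print(df.columns.tolist())         # candidate fields to map to properties
print(df.head())                   # first few substances as a sanity check
```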
Global disease burden data from IHME institute
Workflow

| Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Importing data into Wikidata | Date import complete and notes |
|---|---|---|---|---|---|
| Name: IHME Global Burden of Disease Study 2016. Source: Institute for Health Metrics and Evaluation (IHME) at the University of Washington. Link: [1] Description: IHME produces global and country-specific estimates of disease burden (i.e. years of healthy life lost to death or disease). The estimates of disease burden for different diseases would be valuable in understanding their relative importance in the world. Property disease burden (P2854) can be used to link a disease to a respective estimate in DALYs. | Link: Google drive folder; Google sheet for data. Done: To do: Notes: The diseases should be linked to existing disease items in Wikidata. Is there a list of diseases per ICD-10 code? | Structure: Example item: laryngeal cancer (Q852423). Done: To do: | Done: To do: Notes: | Done: To do: Manual work needed: | Date complete: Notes: |
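On the ICD-10 question in the notes above: one way to get such a list is to ask the Wikidata Query Service for items carrying an ICD-10 code. A hedged sketch, assuming the standard SPARQL endpoint and the `requests` library (P494 appears to be the ICD-10 property; verify before relying on it):

```python
# Hedged sketch: list diseases by ICD-10 code via the Wikidata Query Service.
import requests

SPARQL = """
SELECT ?disease ?diseaseLabel ?icd10 WHERE {
  ?disease wdt:P494 ?icd10 .    # P494: ICD-10 code (assumed property ID)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": SPARQL, "format": "json"},
    headers={"User-Agent": "data-import-hub-example/0.1 (demo)"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["icd10"]["value"], row["diseaseLabel"]["value"], row["disease"]["value"])
```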
Discussion:
Notified participants of WikiProject Medicine
How do I actually link the names of the diseases (in the data) to the disease items (in Wikidata)? --Jtuom (talk) 14:49, 20 December 2017 (UTC)
- @Jtuom: I would write a separate script that maps the names to the Wikidata IDs. It makes the process much more painless to first check that the mapping works well. --Tobias1984 (talk) 19:26, 20 December 2017 (UTC)
- @Tobias1984: Thanks for the advice. However, I assume that when you say "I would write" you don't mean that you would actually write such a script yourself. I'd love to do it but I don't know how. So far, I have managed to create my first SPARQL script by imitating existing scripts [2]. However, the sensitivity and specificity of that script are very poor, and it cannot be used to map the diseases I need for this data import. I'd like to try a script that takes each disease name from my data, searches for it on Wikipedia, and returns the respective item number from Wikidata -- but I have no idea how that could be done. There are maybe 180 diseases on the list, so it could be done in half a day by hand, but there are probably better solutions. Can someone help? --Jtuom (talk) 13:25, 22 December 2017 (UTC)
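A hedged sketch of the kind of mapping script discussed here, assuming the `requests` library: it uses the standard `wbsearchentities` API action on wikidata.org and simply keeps the top hit, so the output still needs the manual review pass Tobias1984 recommends. The disease list is a placeholder:

```python
# Hedged sketch: map disease names to Wikidata QIDs via wbsearchentities.
import requests

API = "https://www.wikidata.org/w/api.php"
disease_names = ["laryngeal cancer", "tuberculosis"]   # placeholder for the ~180 names

def search_qid(name):
    params = {
        "action": "wbsearchentities",   # standard Wikibase search action
        "search": name,
        "language": "en",
        "type": "item",
        "format": "json",
    }
    hits = requests.get(API, params=params).json().get("search", [])
    return hits[0]["id"] if hits else None  # top hit only; review manually

for name in disease_names:
    print(f"{name} -> {search_qid(name)}")
```

With roughly 180 names this runs in well under a minute, so the half-day of manual work shrinks to spot-checking the ambiguous matches.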
Import a spreadsheet myself?
Hello, I've prepared a spreadsheet to import the names of the winners of the tennis Swiss Open from 2000 to 2017. I see this as a test before I start importing more sports data. Is there a way I can import this file myself, or do I need to use the Import Hub? Here is the file for your review: https://docs.google.com/spreadsheets/d/1sTwCwyo6n-xPlWjk3xT2DmKUoYKOxkjpHsa6-0_kYIM/edit?usp=sharing Wallerstein-WD (talk) 22:21, 30 May 2018 (UTC)
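Not an answer confirmed in this thread, but one common self-service route is QuickStatements. A hedged sketch that turns a winners CSV into QuickStatements V1 commands; the file name, the column names, and the edition-item mapping are all hypothetical, and winner (P1346) is assumed to be the appropriate property for a tournament edition:

```python
# Hedged sketch: emit QuickStatements V1 rows ("item<TAB>property<TAB>value")
# adding winner (P1346) statements to tournament-edition items.
import csv

# Hypothetical mapping from year to the QID of that year's tournament edition;
# these must be looked up or created first.
EDITION_QID = {"2016": "Q-EDITION-2016", "2017": "Q-EDITION-2017"}

with open("swiss_open_winners.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):       # hypothetical columns: year, winner_qid
        edition = EDITION_QID.get(row["year"])
        if edition:
            print(f"{edition}\tP1346\t{row['winner_qid']}")
```

The printed lines can then be pasted into the QuickStatements tool for review before running the batch.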
TheRightDoctors
Name of dataset: TheRightDoctors
Source: Internet
Link: www.therightdoctors.com
Description: Insights from the world's best medical minds. Connect with us to connect with them. We are a digital-health Google Launchpad start-up.
Request by: Dr. Chandra Shekar
https://www.dropbox.com/s/oy4bdvtq6dav7b5/books_at_moma.xlsx?dl=0
Liste historische Kantonsräte des Kantons Zürich (list of historical members of the Cantonal Council of the Canton of Zurich)

Workflow
| Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Importing data into Wikidata | Date import complete and notes |
|---|---|---|---|---|---|
| Name: Mitglieder des Kantonsrats des Kantons Zürich (members of the Cantonal Council of the Canton of Zurich). Source: Kanton Zürich, Direktion der Justiz und des Innern, Wahlen & Abstimmungen: https://wahlen-abstimmungen.zh.ch/internet/justiz_inneres/wahlen-abstimmungen/de/wahlen/krdaten_staatsarchiv/datenexporthinweise.html | Link: [3] Done: To do: Notes: | Structure: Example item: Done: To do: | Done: To do: Notes: | Done: To do: Manual work needed: | Date complete: Notes: |
Thist uzh (talk) 07:12, 2 August 2018 (UTC)
Kerala Flood Data
Hi team,
I would like to upload verified and validated data related to the Kerala floods.
Censo-guía of Archives
| Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Importing data into Wikidata | Date import complete and notes |
|---|---|---|---|---|---|
| Name: Censo-guía de Archivos de España e Iberoamérica. Source: Censo-guía de Archivos de España e Iberoamérica. Link: Directorio - Property ID in Wikidata (the Censo-guía is an authority control for Wikidata). Description: The Censo-guía de Archivos de España e Iberoamérica was created by Law 16/1985 (25 June), the law of "Patrimonio Histórico Español". Article 51 determines that "the State Administration, in collaboration with the other competent administrations, shall compile the census of the assets that make up the documentary heritage". The Censo-guía was later expanded to include institutions from Ibero-America. It functions as a control tool and a communications tool for the archives that exist in Ibero-America. | Link: History; overview of the Censo-guía content. Done: To do: Notes: | Structure: Fields used for the spreadsheet can be found here; this can be expanded to be run throughout the 44k XML entries, and the XML schema of the Censo-guía can be found here (overview) and here (schema). Example item: Done: To do: | Done: To do: Notes: The spreadsheet with the total registries can be found here. | Done: To do: Manual work needed: I'm not sure whether attributes such as repositorarea, lengthshelf and repositorycode exist in Wikidata. Repository code is quite an important one. | Date complete: Notes: |
Discussion:

To clarify, this is the first time I'm trying to do such an import. I've downloaded the around 45k registries from the Censo-guía in XML format, and a friend helped me to convert the XML into a CSV file. I can iterate over those 45k registries to include any other information that might be relevant according to the schema (notice, however, that they don't necessarily have all the fields completed in the XML files). I'm also able to work on improving the data that's currently in the spreadsheet, like removing "()", changing the names of the archives that are in uppercase, and so on. But I'd welcome any instructions on how to improve this dataset so it can be successfully imported into Wikidata. Scann (talk) 16:22, 26 August 2018 (UTC)
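A hedged sketch of the clean-up pass described above, using Python's standard csv module; the file names and the column name `denominacion` are guesses at the schema, not something confirmed in this thread:

```python
# Hedged sketch: normalise archive names in the converted CSV before import.
import csv

def clean_name(raw):
    name = raw.strip()
    if name in ("", "()"):
        return ""                  # drop empty "()" placeholders
    if name.isupper():
        name = name.title()        # ARCHIVO GENERAL -> Archivo General
    return name

with open("censo_guia.csv", newline="", encoding="utf-8") as src, \
     open("censo_guia_clean.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # "denominacion" is a hypothetical field name for the archive's name
        row["denominacion"] = clean_name(row.get("denominacion", ""))
        writer.writerow(row)
```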
Adresse et géolocalisation des établissements d'enseignement du premier et second degrés (address and geolocation of primary- and secondary-level educational institutions)

- Name of dataset: Adresse et géolocalisation des établissements d'enseignement du premier et second degrés
- Source: Éducation Nationale de la République française (French Ministry of National Education)
- Link: https://www.data.gouv.fr/fr/datasets/adresse-et-geolocalisation-des-etablissements-denseignement-du-premier-et-second-degres/
- Description: Geolocated list of primary- and secondary-level educational institutions and of the administrative structures of the Ministry of National Education. Public and private sectors.
- Request by: Psychoslave (talk) 12:04, 10 September 2018 (UTC)
Workflow
| Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Importing data into Wikidata | Date import complete and notes |
|---|---|---|---|---|---|
|  | Link: Done: To do: Notes: | Structure: Example item: Done: To do: | Done: To do: Notes: | Done: To do: Manual work needed: | Date complete: Notes: |
Discussion:
The Cat and Fiddle Clock, Hobart, Tasmania, Australia

Modern electronics and an old English melody brought this nursery rhyme to life. This focal piece of the Cat and Fiddle Arcade was constructed by Gregory Weeding, a talented, ambitious young local who had studied electronics in Melbourne. Charles Davis, owner of a department store of the same name, had decided to have an arcade and fountain, and felt it needed a clock.
The melody, played by a glockenspiel and vibraphone, was recorded in Melbourne; the musicians had to keep playing it again and again until they took exactly thirty seconds – the time taken by the animated rhyme, with its cat, fiddle, dog, dish, spoon and cow, to run its cycle. The clock strikes the hour and – hey, diddle, diddle – the children stand entranced as the cow jumps over the moon. It happens every day at the Cat & Fiddle Square on the hour from 8 am to 11 pm, 7 days a week. In sequence, the cat plays his fiddle, the cow jumps over the moon, the little dog laughs, and there is a cheeky cameo by the dish and spoon. It has brought pleasure to onlookers since 1962. https://m.youtube.com/watch?v=maeZndy7g8c
SSLF City & Housing

A real estate company in Ekkatuthangal, Chennai. The company is registered with TNRERA and was awarded star-category ISO 9001:2015 certification. It is the only real estate company in Chennai to have been granted an authorised trademark by the central government of India. The company was started on 3 October 2007 by Dr. G. Sakthivel, and is known for having the largest site in Tamil Nadu.